Randomised Procedures for Initialising and Switching Actions in Policy Iteration

Authors

  • Shivaram Kalyanakrishnan
  • Neeldhara Misra
  • Aditya Gopalan
Abstract

Policy Iteration (PI) (Howard 1960) is a classical method for computing an optimal policy for a finite Markov Decision Problem (MDP). The method is conceptually simple: starting from some initial policy, “policy improvement” is repeatedly performed to obtain progressively dominating policies, until eventually, an optimal policy is reached. Being remarkably efficient in practice, PI is often favoured over alternative approaches such as Value Iteration and Linear Programming. Unfortunately, even after several decades of study, theoretical bounds on the complexity of PI remain unsatisfactory. For an MDP with $n$ states and $k$ actions, Mansour and Singh (1999) bound the number of iterations taken by Howard’s PI, the canonical variant of the method, by $O(k^n/n)$. This bound merely improves upon the trivial bound of $k^n$ by a linear factor. However, a randomised variant of PI introduced by Mansour and Singh (1999) does yield an exponential improvement, with its expected number of iterations bounded by $O\left(\left(\left(1 + \frac{2}{\log_2 k}\right)\frac{k}{2}\right)^n\right)$. With the objective of furnishing improved upper bounds for PI, we introduce two randomised procedures in this paper. Our first contribution is a routine to find a good initial policy for PI. After evaluating a number of randomly generated policies, this procedure applies a novel criterion to pick one to initialise PI. When PI is subsequently applied, we show that the expected number of policy evaluations, including both the initialisation and the improvement stages, remains bounded in expectation by $O(k^{n/2})$. The key construction employed in this routine is a total order on the set of policies. Our second contribution is a randomised action-switching rule for PI, which admits a bound of $(2 + \ln(k-1))^n$ on the expected number of iterations. To the best of our knowledge, this is the tightest complexity bound known for PI when $k \geq 3$.
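As a rough illustration of the loop the abstract describes (evaluate the current policy, switch actions at improvable states, repeat until no state is improvable), here is a minimal Python sketch of Howard's PI. It is not the paper's randomised procedures. The two-state MDP, the discount factor, the tolerance, and all function names are hypothetical choices made for this example, and policy evaluation is approximated by repeated Bellman backups where an exact linear solve would normally be used.

```python
# Sketch of Howard's Policy Iteration on a toy MDP (all numbers are hypothetical).
# P[s][a] is a list of (next_state, probability); R[s][a] is the expected reward.

def evaluate(policy, P, R, gamma, sweeps=2000):
    """Approximate the value function of `policy` by repeated Bellman backups."""
    V = [0.0] * len(P)
    for _ in range(sweeps):
        V = [R[s][policy[s]] + gamma * sum(p * V[t] for t, p in P[s][policy[s]])
             for s in range(len(P))]
    return V

def q_value(s, a, V, P, R, gamma):
    """One-step lookahead value of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])

def howard_pi(P, R, gamma, policy, tol=1e-9):
    """Switch every improvable state to a greedy action until none is improvable."""
    evaluations = 0
    while True:
        V = evaluate(policy, P, R, gamma)
        evaluations += 1
        new_policy = list(policy)
        for s in range(len(P)):
            best = max(range(len(P[s])), key=lambda a: q_value(s, a, V, P, R, gamma))
            # A state is improvable if some action beats the current one by more than tol.
            if q_value(s, best, V, P, R, gamma) > V[s] + tol:
                new_policy[s] = best
        if new_policy == policy:
            return policy, V, evaluations
        policy = new_policy

# Toy MDP with n = 2 states and k = 2 actions, deterministic transitions.
P = [[[(0, 1.0)], [(1, 1.0)]],   # state 0: action 0 stays, action 1 moves to state 1
     [[(1, 1.0)], [(0, 1.0)]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 0.0],                 # no reward for either action in state 0
     [1.0, 0.0]]                 # reward 1 only for staying in state 1

policy, V, evaluations = howard_pi(P, R, gamma=0.9, policy=[0, 1])
print(policy, V, evaluations)
```

On this toy instance the number of policy evaluations cannot exceed the trivial bound of $k^n = 4$, since each iteration yields a strictly dominating policy and there are only four policies in total.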

Similar resources

Batch-Switching Policy Iteration

Policy Iteration (PI) is a widely-used family of algorithms for computing an optimal policy for a given Markov Decision Problem (MDP). Starting with an arbitrary initial policy, PI repeatedly updates to a dominating policy until an optimal policy is found. The update step involves switching the actions corresponding to a set of “improvable” states, which are easily identified. Whereas progress ...

Asymmetric Effects of Monetary Policy and Business Cycles in Iran using Markov-switching Models

This paper investigates the asymmetric effects of monetary policy on economic growth over business cycles in Iran. Estimating the models using the Hamilton (1989) Markov-switching model and by employing the data for 1960-2012, the results well identify two regimes characterized as expansion and recession. Moreover, the results show that an expansionary monetary policy has a positive and statist...

Multisectoral Actions for Health: Challenges and Opportunities in Complex Policy Environments

Multisectoral actions for health, defined as actions undertaken by non-health sectors to protect the health of the population, are essential in the context of inter-linkages between three dimensions of sustainable development: economic, social, and environmental. These multisectoral actions can address the social and economic factors that influence the health of a population at the local, natio...

Methods for Pricing American Options under Regime Switching

We analyze a number of techniques for pricing American options under a regime switching stochastic process. The techniques analyzed include both explicit and implicit discretizations with the focus being on methods which are unconditionally stable. In the case of implicit methods we also compare a number of iterative procedures for solving the associated nonlinear algebraic equations. Numerical...

SWITCHING TEAMS ALGORITHM FOR SIZING OPTIMIZATION OF TRUSS STRUCTURES

Meta-heuristics have received increasing attention in recent years. The present article introduces a novel method in such a class that distinguishes a number of artificial search agents called players within two teams. At each iteration, the active player concerns some other players in both teams to construct its special movements and to get more score. At the end of some iterations (like quart...

Publication date: 2016